Comps readings this week
Diffusion of innovations questions figured prominently in the folder of comps questions - seemed like nearly everyone had a question relating this area to another area, so this finishes up the readings I had on diffusion of innovations. (this post was added to throughout the week and finished after the post on comps preparations. there will probably still be some posts on comps readings, but I'm supposed to be doing more integrating now - and not in the fun math way!)
Ilie, V., Van Slyke, C., Green, G., & Lou, H. (2005). Gender Differences in Perceptions and Use of Communication Technologies: A Diffusion of Innovation Approach.
Information Resources Management Journal, 18(3), 13-31
users' perceptions of technology influence intention to use the technology >
user perceptions differ by gender
This article looks at how gender impacts "intention to use" IM. The standard Rogers things: perceived ease of use, relative advantage, compatibility, observability (broken down into visibility and results demonstrability), plus perceived critical mass. Note that all of these have been studied by people other than Rogers, and they are all "perceived" - it's not Gartner's assessment of relative advantage, it's what the user thinks. They also review the extensive literature on gender differences in communication - both in person and online. The participants were business students who were of course heavy users of ICTs. They did a survey, using scales from other studies for most items and making their own for perceived critical mass. I'll leave the stats to anyone who's interested to read. The men were into relative advantage, results demonstrability, and perceived critical mass. Women were into ease of use and visibility. This matched the hypotheses.
Venkatesh, V., Morris, M. G., Davis, G. B., & Davis, F. D. (2003). User acceptance of information technology: Toward a unified view.
MIS Quarterly, 27(3), 425-478
MIS Quarterly articles are pretty meaty. This one seems even more so - jam packed with information. It's been cited about 520 times, 387 times from journals (from WoS).
This article compares 8 models of IT acceptance research and comes up with a unified model. The original 8 models were tested to see how much variance they explained, then the new model was tested against the first set of data (an improvement), and then against new data (a decent adjusted R^2).
The 8 models:
- Theory of Reasoned Action (TRA)
- attitude toward behavior
- subjective norm
- Technology Acceptance Model (TAM)
- perceived usefulness
- perceived ease of use
- subjective norm (added as TAM2)
- Motivational Model (MM)
- extrinsic motivation
- intrinsic motivation
- Theory of Planned Behavior (TPB)
- attitude toward behavior
- subjective norm
- perceived behavioral control
- Combined TAM and TPB
- attitude toward behavior
- subjective norm
- perceived behavioral control
- perceived usefulness
- Model of PC Utilization
- job-fit
- complexity
- long-term consequences
- affect towards use
- social factors
- facilitating conditions
- Innovation Diffusion Theory (not really strictly Rogers, more like various IS takes on Rogers)
- relative advantage
- ease of use
- image (if use enhances user's image)
- visibility
- compatibility
- results demonstrability
- voluntariness of use
- Social Cognitive Theory
- outcome expectations-performance
- outcome expectations-personal
- self-efficacy
- affect (liking the behavior)
- anxiety
For each of these things, there are "moderators" including experience, voluntariness, gender, age. There might also be things about the industry, job function of user, and things about the technology itself - but these aren't considered here.
Some of the limitations of previous studies were that they were mostly done with students, they were done retrospectively, they were done with completely voluntary innovations (but managers need to know how to get employees going), and the technologies were fairly simple.
The authors find 4 organizations in 4 different industries with samples drawn from different functional roles, and give a survey immediately after training on a new technology, a month later, and 3 months later. They also gather "duration of use" for 6 months after the training to look at actual usage. The questions are taken from scales from the studies supporting the 8 models and were pre-tested and tweaked... then follows a lot of statistics and testing for validity, reliability, etc. The 4 orgs were grouped into 1a + 1b (voluntary) and 2a + 2b (mandatory). Essentially, the older models each accounted for at most 40-50% of the variance in intention and usage.
The new model - UTAUT, the Unified Theory of Acceptance and Use of Technology (for fear of getting in trouble, I'm re-drawing the diagram rather than copying it)
Each of these pieces is pulled together from pieces of the 8 models - so they use the questions from the other models and group them together this way. They did the standard tests to see if the constructs hung together and tested some hypotheses about the moderators. They then got data from 2 more orgs and ran the new model against that. It essentially worked pretty well: the new model accounted for something like 70 percent of the variance in usage intention. There are definitely some limitations - partially due to sample size and the sheer number of variables.
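Just to fix it in my head, here's a minimal sketch (mine, not theirs - their actual analysis was much more careful) of how you'd check variance explained when moderators show up as interaction terms. The unified model's core constructs are performance expectancy, effort expectancy, social influence, and facilitating conditions; the variable names and data file below are made up.

```python
# Rough sketch only: regress intention on UTAUT-style constructs plus a few
# moderator interactions and look at adjusted R^2. Not the authors' method.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("acceptance_survey.csv")  # hypothetical file of per-respondent scale scores

model = smf.ols(
    "intention ~ perf_expectancy*gender + effort_expectancy*experience"
    " + social_influence*voluntariness",
    data=df,
).fit()

print(model.rsquared_adj)  # the paper reports roughly 0.70 for the unified model
```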
Labels: comps
Comps readings this week
Joho, H., & Jose, J. M. (2006). Slicing and dicing the information space using local contexts. IIiX: Proceedings of the 1st international conference on Information interaction in context, Copenhagen, Denmark. 66-74. (available from: http://eprints.gla.ac.uk/3521/1/slice_and_dice..pdf)
In this article they test a couple of different things about the information interaction in search. They look at having a workspace in the interface and pseudo-facets by co-occurrence (not the typical clustering). There were several tasks of low and high complexity - defined as how much information is given and needed about an imposed task. Participants were much happier with the workspace than the baseline layout and they also did better at identifying relevant pages using the workspace for complex tasks.
Wacholder, N., & Liu, L. (2008). Assessing term effectiveness in the interactive information access process. Information Processing & Management, 44(3), 1022-1031.
Started reading this and then I took a detour to quickly read through: Wacholder, N., & Liu, L. (2006). User preference: A measure of query-term quality. Journal of the American Society for Information Science and Technology, 57(12), 1566-1580. doi:10.1002/asi.20315 - that article describes the experimental setup.
I'm just having a really hard time telling the difference between these two articles. I guess the JASIST article is about what the user prefers and the IP&M article is about how effective the terms are at retrieving the correct result. The setup is that there's an electronic edition of a book. The investigators create a bunch of questions that can be answered with it. They have 3 indexes - the back of the book and two ways of doing noun phrases. One way keeps two phrases if they have the second term in common, and the other keeps a phrase if the same word appears as the head of two or more phrases. They had questions that were easier or harder and created a test interface to show the query terms to the user. The user selects one and can see a bit of the text, which they can cut and paste or type into the answer block. Users preferred the human terms - not surprising. The head-sorting terms had a slight edge over the human terms for effectiveness, with the TEC terms not doing nearly as well.
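Here's a tiny sketch of how I picture the head-sorting idea (entirely my interpretation of the description above; the phrases are invented and a real implementation would extract noun phrases and identify heads with an NLP tool):

```python
# My sketch of "head sorting": treat the last word of a noun phrase as its head,
# group phrases by head, and keep phrases whose head shows up in 2+ phrases.
# The candidate phrases below are invented.
from collections import defaultdict

candidate_phrases = [
    "spontaneous generation",
    "equivocal generation",
    "theory of spontaneous generation",
    "compound microscope",
]

def head(phrase: str) -> str:
    return phrase.split()[-1]   # naive head = last token

by_head = defaultdict(list)
for p in candidate_phrases:
    by_head[head(p)].append(p)

index_terms = [p for phrases in by_head.values() if len(phrases) >= 2 for p in phrases]
print(index_terms)   # the three "generation" phrases survive; "compound microscope" doesn't
```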
White, M. D. (1998). Questions in reference interviews. Journal of Documentation, 54, 443-465.
Looked at 12 pre-search interviews (recall that in this time period, when you wanted to do a literature search using an electronic database, you filled out a form, made an appointment with a librarian, and then she mailed you a set of citations - or you picked them up a few days later). These interviews took place after the librarian had reviewed the form but before she'd done any real searching. Out of these 12 interviews, there were 600 questions (from both sides), apparently coded using a common set of rules as to what counts as a question... None of this seems earth-shattering now. Oh well.
Lee, J. H., Renear, A., & Smith, L. C. (2006). Known-Item Search: Variations on a Concept. Proceedings 69th Annual Meeting of the American Society for Information Science and Technology (ASIST), Austin, TX. , 43. Also available from : http://eprints.rclis.org/8353/
We always talk about known item search, but everyone defines it differently...
Green, R. (1995). Topical Relevance Relationships I. Why Topic Matching Fails. Journal of the American Society for Information Science, 46(9), 646-653.
There are ideal information retrieval system goals, and then there is operational system design. Ideally, relevance, in a strong sense, means that the document/information retrieved helps the user with his or her information need. To get this done in systems, we make some assumptions: namely, that the need can be represented by terms, that documents can be represented by terms, and that the system can retrieve documents by matching those terms. So the weaker version of relevance that we use is matching term to term. But there are lots of things that are helpful or relevant that don't match term for term - like things that are up or down the hierarchy (you search for EM radiation, and the microwave document isn't returned even though microwaves are a specific type of EM radiation). She then goes wayyy into linguistics stuff (as is her specialty) about types of relationships...
Huang, X., & Soergel, D. (2006). An evidence perspective on topical relevance types and its implications for exploratory and task-based retrieval. Information Research, 12(1), paper 281. Retrieved from http://informationr.net/ir/12-1/paper281.html
This article follows closely on the previous (if not temporally then topically - ha!). The authors used relevance assessments from the MALACH project to further define various topical relevance types. The MALACH project has oral histories from Holocaust survivors. Graduate students in history assessed segments for matching with given topics and then provided their reasoning for doing so.
- Direct - says precisely what the person asked
- Indirect - provides evidence so that you can infer the answer; types within indirect:
  - generic - at the point but missing a piece
  - backward inference or abduction - you have the result or a later event and can infer what happened before
  - forward inference or deduction - you have the preceding event or cause
  - from cases
- Context - provides context for the topic, like the environmental, social, or cultural setting
- Comparison - provides similar information about another person, or another time, or another place
So you can see how these are all very important and how a good exploratory search would help with this. As it is now, you have to manually figure out all of the various things to look for - even if the system perfectly matches your query terms, it's not enough! They also discuss how, if you're trying to build an argument, you need different types of evidence at different stages. Good stuff (and not just 'cause the authors are my colleague and advisor)
(so there's a situation at work, where I've been trying to bring some folks around to this point of view - they can only see direct match - but I contend that a new/good info retrieval system should do more)
Wang, P., & Soergel, D. (1998). A Cognitive Model of Document Use during a Research Project. Study I. Document Selection. Journal of the American Society for Information Science, 49(2), 115-133
This was based on Wang's dissertation work - while she worked at a campus library for agricultural economics, she did searching using DIALOG. For this bunch, she had them read aloud and think aloud while they went through the results she had retrieved to pick out the ones they wanted in full text. She recorded this and then coded it. From that she pulled out what document elements they looked at and how they selected documents. I mostly talk about this study in terms of pointing out the document elements that are important (like Engineering Village is spot on with the author and affiliation first), but the decision theory stuff is interesting too. In addition to topicality, their criteria include recency, authority, relationship to the author (went to school with him), citedness, novelty, level, requisites (need to read Japanese), availability, discipline, expected quality, reading time...
I figured while I'm in the relevance section - onward! (with all the cooper, wilson, and kemp stuff... i'm not sure i get it so much.. i'm really not about tricky arguments or nuanced ... as in the Patrick O'Brian novels, I go straight at 'em - even when i read one of these and get completely unscrewed - 5 minutes later I'm confused again)
Cooper, W. S. (1971). A Definition of Relevance for Information Retrieval. Information Storage and Retrieval, 7(1), 19-37. DOI: 10.1016/0020-0271(71)90024-6
this pdf might be corrupted on ScienceDirect... I'll have to check from another machine - (no, it's fine from work). In the meantime I had to - dramatic sigh - get this out of my binder from the information retrieval doctoral seminar. Logical relevance has to do with topic appropriateness. It is the relationship between stored information and an information need. Information need is a "psychological state" that is not directly observable - we hope to express it in words, but that's not the same thing. The query is a first approximation of a representation of an information need. The request is what the system actually gets (is this sounding a bit like Taylor '68?). So when he's doing his own definition, he looks at a very limited situation - a system that answers yes or no questions. (here's where I get into trouble). He defines a premiss set for a component statement of an information need as a group of system statements from which the component statement follows as a logical consequence (minimal means as small as possible - you can't drop any statement and still get the consequence). A statement is "logically relevant to (a representation of) an information need iff it is a member of some minimal premiss set." He later goes on to say that for topical information needs, you can create a component statement tree and get to something similar to Xiaoli & Dagobert's indirect topical relevance. Interestingly, his definition specifically doesn't include things like credibility and utility where other versions of relevance do, even while maybe only developing topical relevance.
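My attempt at pinning the definition down in symbols (my notation, not Cooper's): with S the set of stored statements and c a component statement of the (representation of the) information need,

```latex
% my restatement, not Cooper's notation
P \text{ is a premiss set for } c \iff P \subseteq S \ \text{and}\ P \models c
\qquad
P \text{ is minimal} \iff \text{no proper subset of } P \text{ entails } c
\qquad
s \text{ is logically relevant} \iff s \in P \text{ for some minimal premiss set } P
\text{ of some component statement } c
```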
Wilson, P. (1973). Situational relevance. Information Storage and Retrieval, 9, 457-471. doi:10.1016/0020-0271(73)90096-X
Wilson also notes the difference between psychological relevance - what someone does do, or does perceive to be relevant - and a broader view of logical relevance - something can be relevant whether or not the person noticed it. Wilson is interested in logical relevance. Within logical relevance, there's a narrower logical relevance (elsewhere, direct) and evidential relevance. Something is evidentially relevant if it strengthens or adds to the argument/case. Situational relevance deals with things that are of concern or things that matter, not just things you're mildly interested in. Something is situationally relevant if, when put together with your entire stock of knowledge, it is logically or evidentially relevant to some question of concern. Something is directly relevant if it's relevant to something in the concern set and indirectly situationally relevant if it's relevant to something that isn't part of the concern set. Wilson's situational relevance is time sensitive and person sensitive - what is of concern depends on who you ask. Within all this there are preferences, degree, practicality, etc.
Kemp, D. A. (1974). Relevance, Pertinence, and Information System Development. Information Storage and Retrieval, 10, 37-47.
In which we lead back to Kuhn again (all roads lead back to Kuhn and Ziman if you travel them far enough :) Kemp defines pertinence as a subjective measure of utility for the actual person with the information need, while relevance is something that can be judged more objectively, by others who can compare the expressed information request with the documents retrieved. He compares this to public vs. private knowledge (Ziman, and Foskett), denotation vs. connotation, semantics vs. pragmatics. Along the way, he provides a definition of informal vs. formal communication - but this is really much more complex now. His definition of informal is that it "does not result in the creation of a permanent record, or if it does, then that record is not available for general consultation" (p.40). Of course our informal communication may last well after you'd like it to and is certainly retrievable! His view is that pertinence is ephemeral - but I guess now we would say that it's situated.
Kwasnik, B. H. (1999). The Role of Classification in Knowledge Representation and Discovery. Library Trends, 48(1), 22.
(btw the scans of this in both EbscoHost and Proquest aren't so hot - they're legible, but a little rough) This is a classic article for a reason... like this paragraph:
The process of knowledge discovery and creation in science has traditionally followed the path of systematic exploration, observation, description, analysis, and synthesis and testing of phenomena and facts, all conducted within the communication framework of a particular research community with its accepted methodology and set of techniques. We know the process is not entirely rational but often is sparked and then fueled by insight, hunches, and leaps of faith (Bronowski, 1978). Moreover, research is always conducted within a particular political and cultural reality (Olson, 1998). Each researcher and, on a larger scale, each research community at various points must gather up the disparate pieces and in some way communicate what is known, expressing it in such a way as to be useful for further discovery and understanding. A variety of formats exist for the expression of knowledge--e.g., theories, models, formulas, descriptive reportage of many sorts, and polemical essays.
Just sums up all of scholarly communication in a few sentences. "Classification is the meaningful clustering of experience" - and it can be used in a formative way while making new knowledge and to build theories. Then she describes different classification schemes:
Hierarchies have these properties: inclusiveness, species/differentia (luckily she translates that for us - is-a relationships), inheritance, transitivity, systematic and predictable rules for association and distinction, mutual exclusivity, and necessary and sufficient criteria. People like hierarchical systems because they're pretty comprehensive, they're economical because of inheritance and all, they allow for inferences, etc. But these don't always work because of multiple hierarchies, multiple relationships, transitivity breaking down, our lack of comprehensive knowledge, and other reasons.
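A toy illustration (mine, not from the article) of why inheritance and transitivity make hierarchies so economical - a property stated once at the top holds everywhere below, and is-a chains compose:

```python
# Toy is-a hierarchy (the taxonomy is simplified for illustration).
class ElectromagneticRadiation:
    travels_at_speed_of_light = True          # stated once, inherited by everything below

class Microwave(ElectromagneticRadiation):
    pass

class KitchenOvenMicrowave(Microwave):
    frequency = "2.45 GHz"

print(KitchenOvenMicrowave.travels_at_speed_of_light)              # True - inherited two levels down
print(issubclass(KitchenOvenMicrowave, ElectromagneticRadiation))  # True - is-a is transitive
# This is also why Green's example stings: a search for "electromagnetic radiation"
# that can't traverse the hierarchy misses the microwave document entirely.
```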
Trees go through that splitting but there's not that inheritance of properties. Her examples include part-whole relationships as well as a tree like general - colonel - lt colonel... - private. Trees are good because you can figure out relationships, but they're kind of rigid and handle only one dimension.
Paradigms are matrices showing the intersection of two attributes (really?). Hm.
Facet analysis - choose and develop facets, analyze stuff using the different facets, develop a citation order. These are friendly and flexible once you get going, but deciding on facets is difficult and then there might not be any relationships between the facets.
With all of these schemes, things get disrupted when perspective changes, or the science changes, or there are too many things that don't fit neatly into the scheme. The article stops kind of suddenly - but this really ties back to Bowker and Star, who are much more comprehensive (well, it's a book after all!) about how all of this ties into culture, but less detailed about how classifications work.
Thus completes the relevance section... back to diffusion of innovations (see separate post on Rogers). These articles were originally assigned by M. D. White, who was a guest speaker at our doctoral seminar. One of her advisees did her dissertation on the diffusion of electronic journals - good stuff. Dr. White was on my qualifying "event" committee, but she has since retired, so no luck in having her on my next couple.
Fichman, R. G., & Kemerer, C. F. (1999). The illusory diffusion of innovation: An examination of assimilation gaps. Information Systems Research, 10(3), 255-275
The point of this article is that for corporate IT innovations, there's a real difference between acquisition and deployment; that is, many companies purchase technologies that they never deploy. If you measure adoption by the number of companies that have purchased, then you'll miss rejection and discontinuance, which are actually very prevalent. This difference between cumulative acquisition and cumulative deployment is the assimilation gap. If you think of the typical S-curve, the higher curve (cumulative number acquired) is acquisition and the lower one is deployment; the area between the two curves is the gap. You can draw a line at any time t and see the difference. The problem is that you have censoring - some firms still have not deployed at the end of the observation window. The authors use survival analysis for this, which enables them to use the data even with censoring, to look at median times to deployment, and to make statistical inferences about the relative sizes of two gaps.
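Since survival analysis was new to me, here's a tiny hand-rolled Kaplan-Meier sketch of the idea (completely made-up numbers, not their data): each firm contributes either a time-to-deployment or a censored observation (it had acquired but still hadn't deployed when the window closed), and the estimator only counts deployments against the firms still "at risk" at that time.

```python
# Toy Kaplan-Meier estimate of the "survival" of the undeployed state.
# Data are invented: (months from acquisition to deployment, deployed?)
# deployed=False means the firm was still sitting on the software at the end
# of the observation window (right-censored).
firms = [(3, True), (5, True), (8, False), (12, True), (18, False), (24, False)]

firms.sort()
at_risk = len(firms)
survival = 1.0
for months, deployed in firms:
    if deployed:
        survival *= 1 - 1 / at_risk   # proportion of at-risk firms deploying now
        print(f"t={months:>2} months  S(t)={survival:.2f}")
    at_risk -= 1                      # censored firms leave the risk set too
# Median time to deployment is the first t where S(t) drops below 0.5; with enough
# censored firms it may never drop that far - lots of acquirers who never deploy.
```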
They suggest that reasons for this gap for software innovations in firms might be increasing returns to adoption and knowledge barriers to adoption. Returns to adoption means that the more other organizations have already adopted, the more useful the innovation will be. Reasons for this include network effects, learning from the experiences of others, general knowledge in the industry about the innovation, economies of scale, and industry infrastructure to support the innovation (p. 260). Managers might hedge their bets for innovations that haven't caught on yet - purchase them, but wait to see what others do before deploying. Sometimes technology that is immature is oversold - and this only becomes clear after purchase. Knowledge barriers can be managerial as well as technological. It might not be clear how to go about the deployment process.
The authors did a survey of 1,500 medium to large firms (>500 employees) located using an advertiser directory from ComputerWorld. At these companies the respondents were mid-level IT managers with some software development tools installed at their site. They had 608 usable responses - but they ended up using only 384 because they wanted larger firms (>5 IT staff), who were assumed to be more capable of supporting these software innovations. Acquisition = first purchase of the first instance; deployment = 25% of new projects using the tool. For one tool there was a very small gap, but for another it was pretty large. They came up with median times to deploy and also what percentage of acquirers will probably never deploy (for one innovation, 43%!). They compared these to a baseline from a Weibull distribution (in which 75% deploy in 4 years).
Answers to the survey questions supported the idea that complexity and the technologies being oversold really contributed to this gap. An alternate explanation is that different people in the organization make the acquisition and deployment decisions.
(I'm going to stop now and start on next week's... more diffusion to come)
Labels: comps
Ejournals and journal services: What is worth paying for?
*rant alert*
This post has been bubbling up for a while, but I'm finally taking time out to say it. (see a discussion about crossref and free cloud on J.R.'s site)
This is in response to:
a) Statements by some that anyone can publish a journal (and do it well), that journal hosting services provide little or no value, and that stashing copies of articles anywhere in a random pdf format is just as good as publishing in a journal
b) The ICOLC Statement, which says in part:
1. Purchasers will trade features for price; that is, we can do without costly new interfaces and features. This is not a time for new products. Marketing efforts for new products will have only limited effects, if any at all.
Part of what we (libraries) pay for when we license electronic journals is:
- an interface that allows browsing, searching, and known item retrieval (like if you can just put in a journal name, volume, and page and get the answer)
- an interface that does alerts
- an interface that allows you to export metadata
- an interface with extra features like similar articles, times cited, post to delicious
- an interface that shows you what you have access to and what you don't
- probably most importantly, an interface that our machines will talk to, so that we can use tools like OpenURL resolvers (SFX) and metasearch (like MetaLib) to integrate into discovery platforms (a sketch of what an OpenURL request looks like follows this list)
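For anyone who hasn't peeked behind the curtain, here's roughly what an OpenURL request looks like - a sketch using the common key/value (KEV) style; the resolver base URL is made up, and your institution's link resolver will have its own:

```python
# Sketch of building an OpenURL (Z39.88 KEV style) for a journal article.
# The resolver base URL is hypothetical; real resolvers (SFX etc.) work similarly.
from urllib.parse import urlencode

resolver = "https://sfx.example.edu/resolver"  # hypothetical institutional resolver
params = {
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    "rft.jtitle": "Information Processing & Management",
    "rft.volume": "44",
    "rft.issue": "3",
    "rft.spage": "1022",
    "rft.date": "2008",
}
print(f"{resolver}?{urlencode(params)}")
# The resolver answers the "journal name, volume, page" question: do we have it, and where?
```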
AIAA and some other publishers have chosen to ignore most if not all of these requirements and to strike out on their own - but we still subscribe because they're the only game in town. Some libraries are so cash-strapped that they use aggregators for journal full text instead of using the journal platform. This limits the features available and the context provided for the article, as well as frequently imposing an embargo on access (new articles are not available until 12 months or so after publication). (I choose to believe that they use aggregators because they're cash-strapped, not because they're too lazy to make individual subscription/platform decisions.)
Publishers (like small societies) do not have to figure this out on their own - they join crossref, and they hire an organization like atypon or highwire or ingenta or even (eek) Elsevier or Wiley Blackwell to make their journals available.
It IS worth money to:
- be standards compliant
- have a useful/usable web site that facilitates information discovery - we KNOW that scholars browse journal runs for information and chain from one article to the next; our platforms MUST support this or they are not useful!
- be reliable!
- tell us (librarians) what you're up to and offer us training on how to use your services
- ask us (librarians & users) what we want/like/use
- have a long term digital preservation plan
We DO NOT want to give publishers (like AIAA) and others more money to:
- reinvent the wheel - to build their own site, from scratch, which is pretty but not usable or useful or standards compliant
- lobby congress against things that we hold dear
- hire lawyers to prevent us from doing what they have already licensed us to do
- generally be evil
*done ranting, I feel better now, thank you*
Update, later that day: AIAA now has DOIs, thank goodness, but they still have issues. You could host your journal on BMC (if you are in biomed!) or on some open journal service - not all of these are created equal! Your data export should be available in every format the major bibliographic/citation managers take (RIS, txt, EndNote, RefWorks, BibTeX...). Nice text and online as well as offline readability (how about readable HTML and readable PDF!)
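To make the export point concrete, here's the kind of minimal RIS record a "download citation" link should hand back (hand-assembled for one of the articles above, so treat the field choices as illustrative rather than an exact export):

```python
# Sketch: a minimal RIS record, the sort of thing a citation export should produce.
ris_record = "\n".join([
    "TY  - JOUR",
    "AU  - Wang, P.",
    "AU  - Soergel, D.",
    "TI  - A Cognitive Model of Document Use during a Research Project. Study I. Document Selection",
    "JO  - Journal of the American Society for Information Science",
    "PY  - 1998",
    "VL  - 49",
    "IS  - 2",
    "SP  - 115",
    "EP  - 133",
    "ER  - ",
])
print(ris_record)
```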
Comps reading - Diffusion of Innovations
Very disappointing progress for assorted reasons, not related to the book itself.
Rogers, E. M. (2003). Diffusion of innovations. 5th ed. New York: Simon & Schuster.
This book is very readable and enjoyable. Not to mention that the author reinforces what he says by referring backward and forward in the book a lot and italicizing new terms and providing complete definitions. The book is also peppered with case studies pulled from the literature.
Diffusion of innovations research got its start in the study of diffusion of agricultural technologies in the Midwest. Rogers' dissertation in 1954 was on the diffusion of an insecticide in Iowa. He cites his own work from that time forward to work in press at the time of writing. This type of research is very popular in communication (and journalism), but also in marketing, epidemiology, medicine, sociology, anthropology, information science, international development, and elsewhere.
Diffusion is the process in which an innovation is communicated through certain channels over time among the members of a social system. (p.5)
Communication is a process in which participants create and share information with one another in order to reach a mutual understanding (p.5)
Rogers also defines innovation, information, and the other terms he uses. He starts with the elements of diffusion (innovation, channels, time, social system) and continues by discussing the history and criticisms of the research. It's not until chapter 4 that he gets to the generation of innovations. Chapter 5 is one of the more important chapters, covering the innovation-decision process.
For individuals, the innovation decision process is
knowledge > persuasion > decision > implementation > confirmation
Within knowledge, there are a few different kinds: awareness, how-to, and principles (how it works). Different types of knowledge are needed by different adopters; for early adopters, awareness is most important. Persuasion is harder for preventive types of innovations (take vitamins), and there's often a knowledge-adoption gap. The decision can end in rejection of an innovation. Implementation can include re-invention (adopters adapting the innovation to their local circumstances). A higher degree of re-invention leads to faster adoption and greater sustainability. Even after all this is done there can be discontinuance.
Chapter 6 is about the attributes of the innovation and how they impact its diffusion. Basically, these perceived attributes are important:
- relative advantage (benefits it has over other innovations or existing stuff, within the social system, as perceived by potential adopters)
- compatibility (how does it fit with existing culture, power outlets, ways of doing business, etc)
- complexity
- trialability (can you give it a whirl before making some big commitment?)
- observability (can you see people using it?)
This section also gets into incentives and mandates for adoption.
Chapter 7 is about properties of the adopter and adopter categories like innovators (2.5%), early adopters (13.5%), early majority (34%), late majority (34%), laggards (16%). There are a lot of different characteristics of early adopters including ability to deal with abstraction, rationality, intelligence, and exposure to mass media.
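Those tidy percentages come from slicing a normal curve of time-of-adoption at the mean and at one and two standard deviations before and after it - a quick check of the arithmetic (mine, using the standard normal; Rogers rounds the first slice up to 2.5%):

```python
# The adopter-category percentages fall out of a normal curve of adoption time,
# cut at the mean and at one and two standard deviations before/after it.
from scipy.stats import norm

shares = {
    "innovators":      norm.cdf(-2),                 # earlier than mean - 2 sd
    "early adopters":  norm.cdf(-1) - norm.cdf(-2),  # between -2 sd and -1 sd
    "early majority":  norm.cdf(0)  - norm.cdf(-1),  # between -1 sd and the mean
    "late majority":   norm.cdf(1)  - norm.cdf(0),   # between the mean and +1 sd
    "laggards":        1 - norm.cdf(1),              # later than mean + 1 sd
}
for name, share in shares.items():
    print(f"{name:>15}: {share:.1%}")   # ~2.3%, 13.6%, 34.1%, 34.1%, 15.9%
# Rogers rounds these to the familiar 2.5 / 13.5 / 34 / 34 / 16.
```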
Those few chapters in the middle are the most important. Chapter 8 discusses diffusion networks and how homophily and heterophily come into play. Also opinion leaders, how to find them, and what they do. Critical mass. Individual thresholds for adoption. Chapter 9 is about the change agent (someone who works to influence adoption decisions in the direction desired by the change agency).
Chapter 10 discusses innovation in organizations - and I always think this chapter will be most important, but it's really not much different from the others. Internal characteristics of the organization:
- centralization
- complexity (not what you think - it's how smart the employees are, their knowledge and expertise so that they can individually understand new innovations)
- formalization
- interconnectedness
- organizational slack (more time/money left over to spend on innovations)
- size
The organizational innovation-decision process is slightly different:
agenda setting > matching > redefining structure > clarifying > routinizing
Chapter 11 discusses consequences of innovations. A lot of bad unintended consequences came with things missionaries tried to do (giving steel axes to younger members of a society when the elders had had control of the tools - upset the apple cart). It's like we think this is all done, but I was hearing on the radio the other day about how a woman is campaigning against a lot of aid for Africa because it's making local businesses less competitive, it's enriching dictators, and it's not encouraging local development. I don't know for sure, but it seems like we're still not looking at the consequences (intended and unintended, direct and indirect, desired and undesired).
I like this book a lot - but if you really want to get into this area, there are tons and tons of journal articles with more details. (I'll be re-reading a couple of these soon).
Labels: comps